Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters
نویسندگان
چکیده
This paper describes a novel framework for statistical parametric speech synthesis in which statistical modeling of the speech waveform is performed through the joint estimation of acoustic and excitation model parameters. The proposed method combines extraction of spectral parameters, considered as hidden variables, and excitation signal modeling in a fashion similar to factor analyzed trajectory hidden Markov model. The resulting joint model can be interpreted as a waveform level closed-loop training, where the distance between natural and synthesized speech is minimized. An algorithm based on the maximum likelihood criterion is introduced to train the proposed joint model and some experiments are presented to show its effectiveness.
منابع مشابه
Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System
Neural network-based models that generate glottal excitation waveforms from acoustic features have been found to give improved quality in statistical parametric speech synthesis. Until now, however, these models have been trained separately from the acoustic model. This creates mismatch between training and synthesis, as the synthesized acoustic features used for the excitation model input diff...
متن کاملA novel irregular voice model for HMM-based speech synthesis
State-of-the-art text-to-speech (TTS) synthesis is often based on statistical parametric methods. Particular attention is paid to hidden Markov model (HMM) based text-to-speech synthesis. HMM-TTS is optimized for ideal voices and may not produce high quality synthesized speech with voices having frequent non-ideal phonation. Such a voice quality is irregular phonation (also called as glottaliza...
متن کاملStatistical parametric speech synthesis with a novel codebook-based excitation model
Speech synthesis is an important modality in Cognitive Infocommunications, which is the intersection of informatics and cognitive sciences. Statistical parametric methods have gained importance in speech synthesis recently. The speech signal is decomposed to parameters and later restored from them. The decomposition is implemented by speech coders. We apply a novel codebook-based speech coding ...
متن کاملDeep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model
This paper proposes a deep neural network (DNN)-based statistical parametric speech synthesis system using an improved time-frequency trajectory excitation (ITFTE) model. The ITFTE model, which efficiently reduces the parametric redundancy of a TFTE model, improved the perceptual quality of the vocoding process and the estimation accuracy of the training process. However, there remain problems ...
متن کاملStudy on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کامل